User simulation for spoken dialogue systems: learning and evaluation
Abstract
We propose “advanced” n-grams as a new technique for simulating user behaviour in spoken dialogue systems, and we compare them with two methods used in our prior work: linear feature combination and “normal” n-grams. All methods operate at the intention level and can incorporate speech recognition and understanding errors. In the linear feature combination model, user actions (lists of 〈speech act, task〉 pairs) are selected based on features of the current dialogue state, which encodes the whole history of the dialogue. The user simulation based on “normal” n-grams treats a dialogue as a sequence of lists of 〈speech act, task〉 pairs; here the length of the history considered is restricted by the order of the n-gram. The “advanced” n-grams are a variation of the normal n-grams in which user actions are conditioned not only on speech acts and tasks but also on the current status of the tasks, i.e. whether the information needed by the application (in our case flight booking) has been provided and confirmed by the user. This captures elements of goal-directed user behaviour. All models were trained and evaluated on the COMMUNICATOR corpus, to which we added annotations for user actions and dialogue context. We then evaluate how closely the synthetic responses resemble real user responses by comparing the user response generated by each user simulation model in a given dialogue context (taken from the annotated corpus) with the actual user response. We propose expected accuracy, expected precision, and expected recall as evaluation metrics, in place of the standard precision and recall used in prior work, and we discuss why they are more appropriate for evaluating user simulation models than their standard counterparts. The advanced n-grams produce higher scores than the normal n-grams for small values of n, which indicates their strength when too little data is available to train larger n-grams.
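The difference between the “normal” and “advanced” n-gram contexts can be illustrated with a minimal count-based sketch. It is not the authors' implementation: the class `NGramUserSimulator`, the helper `make_context`, the speech-act and task labels, and the status values (`"empty"`, `"filled"`) are all hypothetical names chosen for illustration, and no smoothing or back-off is applied.

```python
import random
from collections import Counter, defaultdict

# An action is a tuple of (speech_act, task) pairs.
# All labels and status values here are hypothetical, not COMMUNICATOR annotations.

def make_context(history, n, task_status=None):
    """Conditioning context: the last n-1 actions, plus task status if given."""
    recent = tuple(history[-(n - 1):]) if n > 1 else ()
    if task_status is None:
        return recent  # "normal" n-gram context
    # "advanced" context: also condition on the status of each task
    return (recent, tuple(sorted(task_status.items())))

class NGramUserSimulator:
    def __init__(self, n, use_task_status=False):
        self.n = n
        self.use_task_status = use_task_status
        self.counts = defaultdict(Counter)

    def _context(self, history, task_status):
        status = task_status if self.use_task_status else None
        return make_context(history, self.n, status)

    def train(self, examples):
        """examples: iterable of (history, task_status, user_action)."""
        for history, task_status, user_action in examples:
            self.counts[self._context(history, task_status)][user_action] += 1

    def distribution(self, history, task_status):
        """Relative-frequency estimate of P(user_action | context)."""
        counter = self.counts.get(self._context(history, task_status), Counter())
        total = sum(counter.values())
        return {a: c / total for a, c in counter.items()} if total else {}

    def sample(self, history, task_status, rng=random):
        """Draw one user action, or None for an unseen context."""
        dist = self.distribution(history, task_status)
        if not dist:
            return None
        actions = list(dist)
        return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]

# Toy training data (hypothetical).
ask_dest = (("request_info", "dest_city"),)
provide = (("provide_info", "dest_city"),)
reprovide = (("reprovide_info", "dest_city"),)
examples = [
    ([ask_dest], {"dest_city": "empty"}, provide),
    ([ask_dest], {"dest_city": "empty"}, provide),
    ([ask_dest], {"dest_city": "filled"}, reprovide),
]

normal = NGramUserSimulator(n=2)
normal.train(examples)
advanced = NGramUserSimulator(n=2, use_task_status=True)
advanced.train(examples)

normal_dist = normal.distribution([ask_dest], {"dest_city": "empty"})
advanced_dist = advanced.distribution([ask_dest], {"dest_city": "empty"})
```

In this toy data the normal bigram splits its probability between `provide` and `reprovide` after the same system prompt, while the status-conditioned model separates the two cases. That is the kind of goal-directed distinction the abstract attributes to the advanced n-grams.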
The linear model produces the best expected accuracy, but with respect to expected precision and expected recall it is outperformed by the large n-grams, even though it is trained using more information. As a task-based evaluation, we also run each of the user simulation models against a system policy trained on the same corpus. Here the linear feature combination model outperforms the other methods, and the advanced n-grams outperform the normal n-grams for all values of n, which again shows their potential. We also calculate the perplexity of the different user models.
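The abstract names the expected metrics but does not define them, so the sketch below is only one plausible reading, stated as an assumption: each candidate action's precision and recall against the real user action is weighted by the probability the model assigns to that candidate, and expected accuracy is taken as the probability mass placed on the exact observed action. Perplexity follows the standard definition, the exponential of the average negative log-probability of the observed actions. All function names and labels are hypothetical.

```python
import math

def precision_recall(predicted, actual):
    """Precision and recall between two collections of (speech_act, task) pairs."""
    predicted, actual = set(predicted), set(actual)
    overlap = len(predicted & actual)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(actual) if actual else 0.0
    return precision, recall

def expected_precision_recall(distribution, actual):
    """Probability-weighted precision and recall (assumed definition)."""
    exp_p = exp_r = 0.0
    for action, prob in distribution.items():
        p, r = precision_recall(action, actual)
        exp_p += prob * p
        exp_r += prob * r
    return exp_p, exp_r

def expected_accuracy(distribution, actual):
    """Probability assigned to exactly the observed action (assumed definition)."""
    return distribution.get(actual, 0.0)

def perplexity(probabilities):
    """exp of the mean negative log-probability of the observed actions."""
    return math.exp(-sum(math.log(p) for p in probabilities) / len(probabilities))

# Hypothetical example: the real user gave two pieces of information at once.
actual = (("provide_info", "dest_city"), ("provide_info", "depart_date"))
partial = (("provide_info", "dest_city"),)
model_distribution = {actual: 0.5, partial: 0.5}

ep, er = expected_precision_recall(model_distribution, actual)
ea = expected_accuracy(model_distribution, actual)
ppl = perplexity([0.5, 0.25])
```

Under these assumptions the partially correct candidate still earns credit for precision and recall, whereas expected accuracy only rewards an exact match. This is consistent with the abstract's observation that a model can rank differently on the two kinds of metric.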
Similar resources
A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies
Within the broad field of spoken dialogue systems, the application of machine-learning approaches to dialogue management strategy design is a rapidly growing research area. The main motivation is the hope of building systems that learn through trial-and-error interaction what constitutes a good dialogue strategy. Training of such systems could in theory be done using human users or using corpor...
Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System
We report on the design, construction and empirical evaluation of a large-scale spoken dialogue system that optimizes its performance via reinforcement learning on human user dialogue data.
Evaluation of a hierarchical reinforcement learning spoken dialogue system
We describe an evaluation of spoken dialogue strategies designed using hierarchical reinforcement learning agents. The dialogue strategies were learnt in a simulated environment and tested in a laboratory setting with 32 users. These dialogues were used to evaluate three types of machine dialogue behaviour: hand-coded, fully-learnt and semi-learnt. These experiments also served to evaluate the ...
Adaptive Information Presentation for Spoken Dialogue Systems: Evaluation with real users
We present evaluation results with human subjects for a novel data-driven approach to Natural Language Generation in spoken dialogue systems. We evaluate a trained Information Presentation (IP) strategy in a deployed tourist-information spoken dialogue system. The IP problem is formulated as statistical decision making under uncertainty using Reinforcement Learning, where both content planning ...
Hierarchical Reinforcement Learning for Spoken Dialogue Systems
This thesis focuses on the problem of scalable optimization of dialogue behaviour in speech-based conversational systems using reinforcement learning. Most previous investigations in dialogue strategy learning have proposed flat reinforcement learning methods, which are more suitable for small-scale spoken dialogue systems. This research formulates the problem in terms of Semi-Markov Decision P...
On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data
The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...